Method and system for function-specific time-configurable replication of data manipulating functions

ABSTRACT

The system (10) and method (100) of the invention provide for function-specific replication of data manipulating functions (12) performed on data, such as files or objects, with a configurable time delay (14) for each function to be replicated. The system (10) and method (100) include a replication management module (40) for managing the consistent function-specific replication of data manipulating functions (12) with a function-specific delay (14) between a source storage system(s) (20, 65) and a destination storage system(s) (30, 75), and optionally include a replication monitoring database (42).

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of U.S. patent application Ser. No. 12/140,296, of the same title, filed Jun. 17, 2008, which is a continuation-in-part of U.S. patent application Ser. No. 11/939,633, of the same title, filed Nov. 14, 2007, which claims priority to US provisional application No. 60/949,357, of the same title, filed Jul. 12, 2007, the contents of which are incorporated by reference hereto.

FIELD OF THE INVENTION

The present invention relates generally to storage systems, which are able to store digital objects or files. More specifically, the present invention relates to data replication systems and methods.

BACKGROUND OF THE INVENTION

Several storage systems provide data replication capabilities for the purpose of either logical error recovery or disaster tolerance, which require, respectively, high availability and relatively high integrity. Storage systems allow block, object or file access and provide a means to replicate data from source data storage to a backup data storage system. The method and system for function-specific time-configurable replication of data manipulating functions applies to storage systems allowing object and file access only.

Object-based storage arrays allow applications to integrate a set of commands, typically called an Application Programming Interface (API). The API allows the creation of new objects as well as the modification of existing objects. For storage arrays that also provide Write-Once-Read-Many (WORM) functionality, it may not be possible to modify already stored objects. Deletion of objects is possible and, in the case of WORM storage arrays, deletion is prevented before the specified retention time has expired.

File-oriented storage arrays provide users or applications the possibility of accessing the system using a file-share. These storage systems provide access to the installed capacity using standard file sharing protocols like NFS (meaning Network File System) or CIFS (meaning Common Internet File System). These protocols may also have proprietary extensions to implement special functionality like WORM file systems or WORM shares.

The storage array may also be a standard server running an operating system available from one of the many providers of operating systems. The server would provide access to the available capacity using file shares similar to a file-oriented storage array.

The set of data manipulation functions for object or file oriented storage arrays usually contains functions like write, delete, update, write-disable until expiration date or delete-disable before expiration date. The exact implementation, however, is dependent on the storage array. Each individual function on a storage array is described in the array-specific documentation. If the storage array provides special functions that are not standardized in protocols like NFS and CIFS, the array vendor provides a detailed description of the required integration with the storage array.

Existing object or file oriented storage arrays already provide ways to replicate data between two or more storage arrays. The replication may be implemented on the storage array or on a dedicated system that performs the replication of data.

Existing systems also allow replicating changes to the target system. The replication may include or exclude specific functions. If a function is replicated, it is generally replicated as soon as possible.

The changes made to objects or file systems are made by users or applications. Users may typically access file oriented storage systems and perform the normal operations like writes, reads, updates or deletes of files. Applications may access both object and/or file oriented storage arrays. As applications are programmed, they may implement rules to make data read-only up to a certain expiration date. The capability to generate new versions of documents and other advanced functionality exist in various solutions available on the market. Among these advanced storage array functionalities in the prior art are applications which also use WORM functionality on storage arrays.

Data replication functionalities of current replication systems are based on fixed, pre-established and non-configurable delays. Consequently, deletion of data that is referred to by otherwise non-deleted files, objects or applications prevents recovery of such data.

U.S. Pat. No. 6,260,125 to McDowell, the content of which is incorporated herein by reference thereto, discloses an asynchronous disk mirroring system for use within a network computer system, wherein a write queue operates to delay the time of receipt of write requests to storage volumes, with a view to increasing data replication performance. The write queues include several write buffers, wherein the write requests pass through the write queue in a first-in, first-out (FIFO) sequence; and so transmission of write requests may be subject to a time-delay by either a pre-determined amount of time or when the storage or write buffer is full. McDowell also discloses a log file configured to receive the delayed write requests, for log-based mirror reconstruction and check-pointing of the mirrored volumes. The replication of data by the system of McDowell is limited to updating and writing and does not provide function-dependent data replication, nor does it provide configurable replication of data manipulating functions such as delete or write-disable.

Patent application number WO 99/507/747 to Arnon, the content of which is incorporated herein by reference thereto, discloses a method and apparatus for asynchronously updating a mirror of data from a source device, whose purpose is to prevent the overwriting of data on a source storage that has not yet been committed to a target storage system. The Arnon method and apparatus addresses the need for data integrity but does not allow a user to configure replication operations on a function base or time base, and only prevents overwrite of data on a source storage in the situation where data has not been replicated on target storage.

User-controlled data replication of the prior art allows users to control whether replication occurs, but not when it occurs. A system designed by Denehy et al. (Bridging the Information Gap in Storage Protocol Stacks, Denehy et al., Proceedings of the general track, 2002, USENIX annual technical conference, USENIX Association, Berkeley Calif., USA, the content of which is incorporated by reference thereto) allows a user to prioritize data replication actions on specific files based on file designations such as “non-replicated”, “immediately replicated” or “lazily replicated.” However, such configuration only addresses system performance needs for short lifetime data storage systems, and does not address the needs for system integrity and accident recovery.

Patent application WO 02/25445 to Kamel, the content of which is incorporated herein by reference thereto, discloses a method and system for electronic file lifecycle management. Similar applications are also called Hierarchical Storage Management (HSM) applications. File Lifecycle management and HSM software move files based on rules between different storage systems. The system might also create multiple copies on different storage systems if the defined rules or policies define the lifecycle of a file accordingly.

Given the current interrelationship of data stored on networks, what is needed therefore is a way of ensuring that deleted data on devices that are not backed up may be recovered as long as a user wishes to preserve the ability to restore data including references to the deleted data of such devices from backups.

What is needed is a user-controlled replication system for function-specific replication of data manipulating functions that allows users to control both whether and when replication of data manipulating functions occurs.

What is needed is a system or method that allows synchronizing or configuring the time frame within which a data restore is possible from a target storage system and which enables replicating data manipulating functions performed on object or file based storage arrays.

Further, what is needed is a system which more fully addresses the needs for system high availability, integrity and accident recovery.

SUMMARY OF THE INVENTION

The system and method of the invention provide for function-specific replication of data manipulating functions on digital data, such as files or objects, with a configurable time delay for each function to be replicated. The system includes a source storage system from which a data manipulating function is to be replicated, a destination storage system(s) to which the function performed on digital data is replicated, and a replication management module for managing the function-specific replication delay and the function replication between the source storage system(s) and the destination storage system(s).

The replication management module of the invention provides functionality allowing: (1) configuration of a delay after which a data manipulating function will be performed on the destination storage system when data stored on the source storage system, modified or created by the function, is replicated on corresponding data on the destination storage system; (2) the replication of the data manipulating function performed on data stored on the source storage system with the configured delay to the destination storage system; and (3) querying of function-specific changes to data of the source storage system in a given timeframe.

It is an object of the invention to provide a system and method which meets the business need of combining data replication for high availability and disaster tolerance with recoverability of data in case of logical errors.

It is another object of the present invention to provide a system and method for function-specific replication of data manipulating functions on digital data that is adaptable to a wide range of storage system architectures, including object-based storage arrays having an application programming interface, file-based storage arrays, and standard computer servers.

It is a further object of the present invention to provide a system and method for function-specific replication of data manipulating functions on digital data that can be implemented in hardware abstraction and virtualization software.

It is yet a further object of the present invention to provide a system and method for function-specific replication of data manipulating functions on digital data that is easily scalable to several and even a large number of destination storage systems.

It is an object of the invention to provide a system and method which replicates the data manipulating function itself and not the data changes.

In an advantage, the system and method solves the business need of combining data replication for high availability and disaster tolerance with recoverability of data in case of logical errors.

In another advantage, the combination of object or file replication for disaster tolerance with the ability to configure the delay of the replication for each function that can be performed on the stored objects or files provides both disaster tolerance and the ability of recovering from logical errors.

In another advantage, the method makes replication of data manipulating functions dependent on the function that was performed on the data and makes the delay of the replication time-configurable, in that the replication of new objects or files can be performed as quickly as possible but the replication of another function, like deletes of objects or files, may be delayed for a configurable amount of time, thereby providing a solution for both disaster tolerance and logical error recovery. This allows the customer to ensure that data on storage arrays that is not backed up is recoverable for the same time that a restore and recovery of references to these objects or files is possible. Such a system thus guarantees that all objects and files are available for recovery as long as references to that data may be restored from backups.

In another advantage, the system and method of the invention delays the deletion of data from the source storage array for a period N until the data is also deleted from the target storage array, thereby allowing the restoring of an application database using the standard recovery procedure as well as providing the possibility of accessing the previously deleted data on the secondary storage array without having to have a complete backup of all data having ever been written to the source storage array. Once the standard recovery procedure is no longer capable of restoring and recovering references to data, the file or object referenced can also be deleted on the target storage array.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a block-based storage system of the prior art where the replication management module is located in the source storage array.

FIG. 2 is a schematic diagram of an object or file based storage array of the prior art where the replication management module is implemented in a separate system.

FIG. 3A and FIG. 3B are schematic diagrams showing the elements of the system for function-specific replication of data manipulating functions on digital data with a configurable time delay, where the replication management module is located on the source storage system.

FIG. 4 is a schematic diagram showing the elements of the system for function-specific replication of data manipulating functions with a configurable time delay, where the replication management module is located between the application or user and the source and the destination storage system, thereby providing access to the storage systems.

FIG. 5 is a schematic diagram showing the elements of the system for function-specific replication of data manipulating functions on digital data with a configurable time delay, having several destination storage systems.

FIG. 6 is a schematic diagram showing the elements of the system for function-specific replication of data manipulating functions on digital data with a configurable time delay, having several source storage systems.

FIG. 7 is a flow chart showing the main steps necessary to implement the function-specific replication system and method of the present invention.

FIG. 8 is a flow chart showing the steps of the information gathering process of the invention for proprietary storage systems of a first class of storage arrays, such class not allowing the querying of the array for changes that were made to the objects or files that are stored on the array.

FIG. 9 is a flow chart showing the steps for implementing the replications monitoring process of the invention for proprietary storage systems of a first class of storage arrays for which the task of replication monitoring requires the creation of a replication monitoring database.

FIG. 10 is a flow chart showing the steps for implementing the replications monitoring process of the invention for a second class of storage arrays, such class not requiring the creation of the replication monitoring database.

FIG. 11 is a flow chart describing the steps necessary to maintain a consistent set of objects or files on the target storage array.

FIG. 12 is a flow chart showing the steps for implementing the delayed function-specific replication of data manipulating functions for a first class of storage arrays based on the replication monitoring database.

FIG. 13 is a flow chart showing the steps for implementing the delayed function-specific replication of data manipulating functions for a second class of storage arrays that do not require the replication monitoring database.

FIG. 14 is a schematic representation of the configuration table of the invention.

FIG. 15 is a schematic representation of the Source Change Table of the invention.

FIG. 16 is a schematic representation of the Outstanding Replications Table of the invention.

FIG. 17 is a schematic representation of the Replication Audit Table of the invention.

FIG. 18 lists examples of different customer requirements and how they are implemented in a configuration table.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Referring now to FIG. 1, a block-based source storage system 60 of the prior art provides a server 80 access to a certain disk capacity. The operating system installed on server 80 possesses the knowledge of where and which object or file lies within this disk capacity. This information can, for example, be stored in the File Allocation Table or I-nodes. An application or user 90 accessing a file on such a server 80 would therefore issue any function-based calls like write, update and delete to that server 80, which in turn knows where the file is located on the block-based source storage system 60. Any function performed by an application or user 90 will result in an update or read of a block on the disk capacity available to the server 80. The replication of writes or updates of a block on the source storage array 60 is embodied in the source storage system 60.

Referring now to FIG. 2, object or file based storage arrays 65 and 75, respectively, provide the functionality of the server 80 mentioned above directly from within the storage array 65 and 75. The application or user 90 accessing a file issues the functions directly to the storage array. For the purpose of abstraction, a server 80 providing file based access to the available disk capacity on source storage array 65 is also included among the file based storage arrays, because an application or user cannot tell whether it is accessing the server or the storage array. To the application or user, they both provide the same functionality of file level access using file access protocols like CIFS or NFS. The replication from the file or object based source storage system 65 to the corresponding target storage array 75 is embodied in the source storage system 65.

Referring now to FIGS. 3A to 6, a system 10 for function-specific replication of data manipulating functions 12 on digital data, such as files or objects, allows for a configurable time delay 14 for each function to be replicated. The system 10 includes a source storage system 20 from which performed data manipulating functions on data are replicated, at least one destination storage system 30 to which performed data manipulating functions are replicated, and a replication management module 40 for managing the function-specific replication delay and the replication of data manipulating functions between the source storage systems and the at least one destination storage system, the module optionally comprising a replication monitoring database 42.

The system 10 provides replication for at least one standard data manipulating function of a group of functions including: write, delete, update, modify, write-disable, write-disable until expiration date, delete-disable and delete-disable until expiration date.

The replication management module 40 provides several novel features. One feature allows for the configuration of a delay after which a specific data manipulating function on data stored on the source storage system is replicated on corresponding data on the destination storage system. Another feature allows for replication of the data manipulating function performed on data stored on the source storage system with the configured delay to the destination storage system. Still another feature allows for querying function-specific changes to data of the source storage system in a given timeframe.

As with the source storage system 20 for replicating data manipulating functions on digital data, at least one destination storage system 30 is based on one of the following architectures: object-based storage arrays comprising an application programming interface, file-based storage arrays or a computer server, comprising memory 36, a CPU 38 and an operating system 39.

The system 10 may directly provide access to storage systems based on any of the following architectures: object-based storage systems having an application programming interface 34, file-based storage arrays, and a computer server 80, including memory 36, a CPU 38 and an operating system 39 as shown in FIG. 5.

The system 10 is adaptable to several different system configurations. Referring now to FIG. 3A, a configuration where the replication management module 40 is located on the source storage system 20 is shown. The information about functions performed by applications or users 90 on objects or files stored is gathered by the replication management module from the source storage system 20 and used to replicate each data manipulating function with a configurable delay to the destination storage system 30. The information gathered may optionally be stored for future reference in the replication monitoring database 42.

Referring now to FIG. 4, a configuration where the replication management module 40 is located between the application or user 90 and the source and destination storage systems 20 and 30 is shown.

Referring now to FIG. 5, a configuration is shown with several destination storage systems 30, one being a secondary destination storage system 32. The replication management module 40 gathers the information for function-specific replication of data manipulating functions from the source storage system 20 and replicates to multiple destination storage systems 30. A destination storage system 30 may be used by a second replication management module as the source storage system to replicate to a secondary destination storage system 32.

Referring now to FIG. 6, a configuration with several source storage systems 20 is shown. One replication management module 40 gathers information from multiple source storage systems 20. All data manipulating functions performed on the multiple source storage systems 20 are replicated to a common destination storage system 30.

The source storage system 20 or the destination storage system 30 are file-based storage arrays, including a server 80 which enables file based access to the available storage capacity of the storage array.

The method 100 for implementing a function-specific replication of data using system 10, as shown in FIG. 7, involves three functions to be performed continuously in parallel or based on a schedule: Gathering information 120, Pending replications monitoring 140 and Delayed function-specific data replication 160.

FIG. 8 shows the gathering of information 120 required for the replication of data manipulating functions that are performed on data stored on a source storage system and replicated to a target storage system. This is achieved by:

-   running an information gathering process 122 using information gathering software,
-   building a replication configuration database 123 including information on the data manipulating functions to be replicated, the source and target storage system, and
-   launching the pending replications monitoring process 140.

The running of an information gathering process 122 includes the substeps of:

-   inserting information from the replication configuration database 123 for the function-specific delayed replication of data manipulating functions in the configuration table 22 of the replication monitoring database, directly from the information gathering software,
-   wherein the information that the information gathering software inserts into the database is:
    -   the definition of the source storage array 124,
    -   the definition(s) of the target storage array(s) 125,
    -   the data manipulating function 126 to be replicated,
    -   the priority 127 of the specified function,
    -   the delay 128 after which the specified function is replicated, and
    -   optionally the definition of a modifier 129 for more granular function-specific replication.
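As a minimal sketch of one row of the configuration table 22, the fields listed above could be modelled as follows. The class name `ConfigEntry`, the field names, the array identifiers and the delay values are all assumptions made for illustration; the source names only the six kinds of information (124 to 129), not a concrete schema.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class ConfigEntry:
    """Hypothetical row of the configuration table 22 (field names assumed)."""
    source_array: str                 # definition of the source storage array (124)
    target_arrays: Tuple[str, ...]    # definition(s) of the target storage array(s) (125)
    function: str                     # data manipulating function to replicate (126)
    priority: int                     # priority of the function (127); 1 = highest here
    delay: timedelta                  # delay after which the function is replicated (128)
    modifier: Optional[str] = None    # optional modifier for finer granularity (129)


# Illustrative values only: writes replicate at once, deletes after 30 days.
configuration_table = [
    ConfigEntry("src-1", ("dst-1",), "write", 1, timedelta(0)),
    ConfigEntry("src-1", ("dst-1",), "update", 2, timedelta(hours=1)),
    ConfigEntry("src-1", ("dst-1",), "delete", 3, timedelta(days=30)),
]
```

Keeping the entries immutable is one possible design choice here: a changed delay would then be expressed as a new row rather than as an in-place edit.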

The priority and the delay are correlated to each other to ensure consistency in the target environment. A typical priority order would assume that new objects created with a write function are of highest priority, changes performed with the update function are of mid-level priority and delete functions of lowest priority. The consequence is that the highest priority function must be assigned the shortest delay and the lowest priority function the longest one. Priority and corresponding delay times 14 are required to ensure the consistency of the target objects or files.
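The correlation between priority and delay can be checked mechanically. The following is a sketch under the assumption that priority is an integer with 1 as the highest priority; the function name `delays_consistent` is hypothetical and does not appear in the source.

```python
from datetime import timedelta


def delays_consistent(entries):
    """Return True if no higher-priority function has a longer delay
    than a lower-priority one.

    entries: iterable of (priority, delay) pairs, priority 1 being highest
    (an assumed convention).
    """
    # Sorting by (priority, delay) puts the longest delay of each priority
    # level last, so comparing neighbours is sufficient.
    ordered = sorted(entries)
    return all(a[1] <= b[1] for a, b in zip(ordered, ordered[1:]))


typical = [(1, timedelta(0)), (2, timedelta(hours=1)), (3, timedelta(days=30))]
inverted = [(1, timedelta(days=1)), (3, timedelta(0))]
```

With these inputs, `delays_consistent(typical)` holds while `delays_consistent(inverted)` does not, because the delete-like lowest priority would replicate before the higher-priority function.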

Consistency between source and target storage arrays with respect to replication of data manipulating functions is defined at the data manipulating function level, and the integrity rules are defined by business criteria based on the objectives to be achieved. There are many different requirements that can be addressed with the current invention. If high availability and disaster recovery are the main requirements, all data manipulating functions (e.g. create/write, update and delete) would be associated with a high priority and short delay. If recoverability is the main objective, the priority for data creation functions would be high. Data changes could be of medium priority and delay and, most importantly, data deletion functions would be replicated with lowest priority and longest delay. The delay would be configured to be as long as the required recoverability period.

There may be compliance reasons that require a change in priorities for a replication of data manipulating functions. If an employee who leaves a company requests his employer to delete his personal data according to local law, the current invention is able to handle this. In such a situation, the subset of data is configured to replicate deletion functions for this employee with highest priority and then the personal data is deleted. This would remove all pending write or update functions from the replication. In this case, the business requirement is to comply with regulations and not to ensure recoverability or high availability.

In order to make the current invention suitable for today's changing business requirements, preference has been given to the implementation of a priority parameter that allows for the validating of the delay to correspond with the priority of the function. Of course, other implementations are possible should a skilled person be given this application and be asked to use its teachings to derive other implementations.

Referring now to FIG. 18, Table 26 lists, by way of example, different customer requirements and their implementation in configuration tables.

For storage systems that provide information on the authors of changes, that is, data about the originating applications or users, the replication management module can be used to further specify the granularity on which the function-specific data replication should act. For example, the module would allow the replication of delete functions from an SEC compliant application as quickly as possible to ensure that content is deleted once it is permissible under the SEC rules to do so and to delay the replication of a delete function from a file archive application that does not fall under regulatory requirements. This behaviour is specified using the modifier 129 entry in the configuration table.

For file-based storage arrays, a differentiation based on a part of the UNC path may provide similar functionality. Application functions performed by accessing the share \\server1\share1 can be replicated differently than functions performed by users accessing \\server1\share2 or \\server2\share1.
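A share-based modifier of this kind might be sketched as a lookup keyed on the server and share portion of the UNC path. The share names are taken from the text; the delay values, the default, and the helper names `share_of` and `delete_delay_for` are assumptions for illustration only.

```python
from datetime import timedelta

# Hypothetical per-share delays for the delete function (values invented).
DELETE_DELAY_BY_SHARE = {
    r"\\server1\share1": timedelta(0),
    r"\\server1\share2": timedelta(days=30),
    r"\\server2\share1": timedelta(days=90),
}


def share_of(unc_path: str) -> str:
    """Reduce a UNC path to its lower-cased \\\\server\\share prefix."""
    parts = unc_path.lstrip("\\").split("\\")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"not a UNC path: {unc_path!r}")
    return "\\\\" + parts[0].lower() + "\\" + parts[1].lower()


def delete_delay_for(unc_path: str, default=timedelta(days=7)) -> timedelta:
    """Pick the delete delay configured for the share a file lives on."""
    return DELETE_DELAY_BY_SHARE.get(share_of(unc_path), default)
```

For example, `delete_delay_for(r"\\server1\share2\docs\a.txt")` resolves to the 30-day entry, while a path on an unlisted share falls back to the assumed default.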

The pending replications monitoring process 140 is a monitoring process for pending replications, which watches for outstanding replications and passes them to the process that performs the actual function replication. The pending replications monitoring periodically queries the source system for changes and inserts them into the database of what has happened on the source (the source change table). In simpler variations, this just creates a list of objects, if the source array allows querying based on timeframe and function performed.

For source storage arrays 20 that do not allow sending event-based information of the functions performed, the interval in which the pending replications monitoring takes place must be specified.
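One polling cycle of the pending replications monitoring can be sketched as below, assuming the source array can be queried for changes in a time window. The callable `query_changes`, the record layout and the stand-in `fake_source` are hypothetical; a real array would be queried through its own interface, and the cycle would be repeated at the configured interval.

```python
def poll_once(query_changes, source_change_table, last_poll, now):
    """Record every change the source reports for the window (last_poll, now]."""
    for change in query_changes(last_poll, now):
        source_change_table.append(change)
    return now  # becomes last_poll for the next cycle


# Stand-in for a source array, used only to make the sketch runnable.
def fake_source(since, until):
    log = [
        {"object": "a.txt", "function": "write", "at": 5},
        {"object": "a.txt", "function": "delete", "at": 15},
    ]
    return [c for c in log if since < c["at"] <= until]


source_change_table = []
last_poll = poll_once(fake_source, source_change_table, 0, 10)
last_poll = poll_once(fake_source, source_change_table, last_poll, 20)
```

Using a half-open window per cycle is a design choice of this sketch: it keeps a change from being recorded twice when two consecutive cycles share a boundary.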

The inputs into the system 10 and method 100 of the invention implementing the function-specific replication of data manipulating functions 12 are gathered in a graphical user interface 19 and stored in the replication monitoring database configuration input table 22. When replicating data manipulating functions between storage systems that do not require a replication monitoring database, the required configuration information may be provided in a configuration file. This file may be created using a graphical user interface or by editing the configuration file in a text editor.

The possibility of specifying more than one destination storage system 30 also allows replicating functions with a different delay for each target system.

In order to implement function-specific replication including a configurable time delay 14, the pending replications monitoring process must provide a means for monitoring pending replications and for determining the delay 14 or exact time 16 to replicate the data manipulating function. The replication time 16 to replicate a functional change may be stored in the replication monitoring database 42 and will be used by the pending replications monitoring process 140 and the delayed function-specific data replication process 160.
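A minimal sketch of how the replication time 16 could be derived from the delay 14, assuming the time at which the function was performed on the source is known. The function names and the record layout are hypothetical and serve only to illustrate the relationship.

```python
from datetime import datetime, timedelta


def replication_time(performed_at, delay):
    """Exact time at which a function becomes due: time performed plus delay."""
    return performed_at + delay


def due_replications(outstanding, now):
    """Select the outstanding replications whose replication time has passed."""
    return [r for r in outstanding if r["replicate_at"] <= now]


t0 = datetime(2008, 1, 1, 12, 0)
outstanding = [
    {"object": "a.txt", "function": "write",
     "replicate_at": replication_time(t0, timedelta(0))},
    {"object": "b.txt", "function": "delete",
     "replicate_at": replication_time(t0, timedelta(days=30))},
]
ready = due_replications(outstanding, t0 + timedelta(hours=1))
```

One hour after the changes, only the write on `a.txt` is selected; the delete on `b.txt` stays pending until its 30-day delay has elapsed.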

Tracking of which function was performed on a storage array is dependent on the functionality that the specific storage array provides. The functionality of the storage array also defines the granularity that can be provided to the user of the application.

The existence of the replication monitoring database 42 with all of the required information stored allows changing the delay with which the replication of a data manipulating function should be performed. The replication time in the outstanding replications table 18 can be changed for data manipulating functions that are not yet replicated. The pending replications monitoring process 140 takes into account the changed replication time to initiate the delayed function-specific replication of data manipulating functions 160. Depending on the environment, it allows increasing or decreasing of the delay based on the customer's actual needs. Based on the information that can be queried from the storage systems, the delay might also be configured independently for each application, file system or subset of objects.
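Changing the delay for functions that are not yet replicated can be sketched with an in-memory SQLite table. The table and column names are assumptions, since the actual layout of the outstanding replications table 18 is defined by FIG. 16 and not reproduced here; times are plain integers in seconds to keep the example short.

```python
import sqlite3

# Hypothetical schema standing in for the outstanding replications table 18.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE outstanding_replications ("
    " object_id TEXT, func TEXT, performed_at INTEGER,"
    " replicate_at INTEGER, replicated INTEGER DEFAULT 0)"
)
rows = [
    ("a.txt", "delete", 1000, 1000 + 3600, 0),  # pending, one-hour delay
    ("b.txt", "delete", 2000, 2000 + 3600, 1),  # already replicated
    ("c.txt", "write", 3000, 3000, 0),          # pending, no delay
]
db.executemany("INSERT INTO outstanding_replications VALUES (?, ?, ?, ?, ?)", rows)


def reschedule(db, func, new_delay_s):
    """Apply a new delay to all not-yet-replicated entries of one function."""
    cur = db.execute(
        "UPDATE outstanding_replications SET replicate_at = performed_at + ?"
        " WHERE func = ? AND replicated = 0",
        (new_delay_s, func),
    )
    db.commit()
    return cur.rowcount  # number of entries whose replication time changed


changed = reschedule(db, "delete", 86400)  # lengthen the delete delay to one day
```

Only the pending delete is rescheduled; the entry that was already replicated and the pending write keep their original replication times.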

The implementation of the system and method for function-specific replication of data manipulating functions requires different versions of the software. Function-specific replication between standard servers running standard operating systems cannot be implemented the same way as replication between proprietary API-based storage systems. Further detail is provided below of the different functions that need to be present depending on the storage array.

The replication monitoring database 42 must be configured for each source storage system, notably with regard to identification of the information to be gathered and tracked, so as to enable the correct and consistent replication of the data manipulating function of the present invention. As an example, an object based storage system does not require the same information as a file based storage system for the replication of data manipulating functions.

The required information is condensed into the least amount of data necessary to implement a function-specific and time-delayed replication of data manipulating functions.

In order to reduce the complexity of today's storage environments, the virtualization of infrastructures is rapidly being adopted in the market. Storage virtualization software abstracts the physical storage systems into logical entities. Virtualization is a good way to mask ongoing migrations or systems being replaced with newer ones. Virtualization software thus knows which function is being performed on which file or object. The method of the present invention, in particular, the replication features thereof, can be implemented in a virtualization layer that provides direct access to source or target storage systems. The system of the present invention can directly provide access to source and target storage systems as shown in FIG. 4.

The way the function-specific information is retrievable from a storage array depends on the functionality that is implemented on a storage array. It also depends on other functionality aspects like the ability to install and run a process on that storage array.

Today, object- or file-oriented storage arrays are built based on two different approaches.

In a first approach, file-oriented storage may be implemented using hardware that provides file level access based on standard operating systems. These operating systems, such as UNIX, Linux or Windows, allow the installation of additional software that can be used to facilitate the implementation of the present invention.

In a second approach, object- and file-oriented storage arrays may be implemented using proprietary operating systems like “Data ONTAP”, “DART” or “CENTRASTAR”. To allow maximum flexibility in changing the time delay for the function-specific replication of data manipulating functions, all detected performed data manipulating functions are gathered as quickly as possible. This means that a deletion of content is recorded once it is discovered by the pending replications monitoring process. This ensures that all outstanding data manipulating functions are still replicated when the configured delays are increased or decreased. It allows updating the replication monitoring database with a new time of replication for all function-specific replications of data manipulating functions not yet completed.

Standard Operating System Based Storage

Standard operating system based storage allows the installation and creation of additional software and services on the server that provides the storage services in the network. The pending replications monitoring process 140 runs as such a process on the storage array or storage server, respectively. Changes in the file systems may either be intercepted or detected, and the required information for the function-specific delayed replication of data manipulating functions may be inserted in the database source change table directly from the pending replications monitoring process.

The whole system or an implementation of the method of the present invention may run on a standard operating system based storage server or storage array.

Proprietary Storage Systems

The implementation of the pending replications monitoring process for proprietary storage systems must at least provide the function-specific information for the process 160 for delayed function-specific replication of data manipulating functions. There are two general approaches that need to be differentiated depending on the class of the storage array.

A first class of storage arrays does not allow querying the array for changes that were made to the objects or files that are stored on the array. In this situation, the pending replications monitoring process 140 of the system implementing the function-specific delayed data replication is described in FIG. 9. Referring to FIG. 11, the process to maintain consistency in the replication of data manipulating functions is described. FIG. 12 describes the replication of data manipulating functions.

In a second class of storage arrays, the task of the pending replications monitoring process 140 does not require the creation of an additional database. The pending replications monitoring process as described in FIG. 10 continuously, or in a scheduled way, queries the source storage arrays for changes made to objects or files based on the function to be replicated and additional information such as when or who performed the function. FIG. 13 shows the delayed function-specific replication of data manipulating functions for the second class of storage arrays.

A good example in the category of object-based storage systems with this query functionality is “EMC CENTERA”, described at emc.com/products/family/emc-centera-family.htm, the content of which, including content in links therein, is incorporated herein by reference. The Query API allows the listing of content based on a timeframe the query is targeted to. The default query would provide a list of objects that may be used to find out when the object was written and who created it. With the same query functionality, the information gathering process 122 can determine which objects were deleted in order to replicate the delete function with the configured delay. The available proprietary storage systems today already provide replication functionality based on independent software or software installed on the storage arrays. A function-specific delayed replication on the storage systems has heretofore not been implemented.

Now referring to FIG. 9, the pending replications monitoring process 140 for the first class of storage arrays requiring a replications monitoring database is built in two steps:

(1) running the pending replications monitoring process 142 continuously or based on a schedule; and
(2) using information gathered in step 150, the outstanding replications maintenance process, to ensure consistency in the replication process.

The pending replications monitoring process 140 is made up of the following substeps which build the replications monitoring database source change table:

-   inserting the function-specific information required to replicate data manipulating functions 143;
-   adding the source information 144 for the function to be replicated, made up of the source storage system as well as the reference for the file or object the function was performed on;
-   inserting the function 145 that was performed on the referenced source file or object;
-   specifying the following:
    -   date and time 146 the function was performed; and optionally,
    -   the modifier 147 who performed the function, and if required,
    -   the before and after image 148 required to perform the function with the configured delay on the target storage system.
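By way of illustration only, one entry of the source change table built by the substeps above might be represented as follows. The class name `SourceChange`, the helper `record_change` and the use of an in-memory list in place of a database table are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class SourceChange:
    source: str                                 # source storage system (144)
    reference: str                              # file or object reference (144)
    function: str                               # function performed (145)
    time: datetime                              # date and time performed (146)
    modifier: Optional[str] = None              # optional: who performed it (147)
    before_after_image: Optional[bytes] = None  # only if required (148)
    completed: bool = False                     # set later by the maintenance process

def record_change(table, source, reference, function, time, modifier=None, image=None):
    """Append one detected function to the (in-memory) source change table."""
    entry = SourceChange(source, reference, function, time, modifier, image)
    table.append(entry)
    return entry
```

A detected deletion would then be recorded, for example, as `record_change(table, "array-1", "object-1", "delete", datetime.now())`, where the system and object names are placeholders.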

Referring now to FIG. 10, the pending replications monitoring process 140 is provided for the second class of storage arrays which does not require a replications monitoring database. The pending replications monitoring is implemented in two steps:

(1) listing the files or objects depending on the function to be replicated with the configured delay 149, and
(2) passing this information to the delayed function-specific replication of data manipulating functions 160.
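By way of illustration only, the listing step for a storage array with query functionality might be sketched as follows. The callable `query_changes` is a hypothetical stand-in for whatever query interface the array offers; its signature and the function name `list_due` are assumptions made for this sketch and do not describe any vendor's API.

```python
from datetime import datetime, timedelta

def list_due(query_changes, function, delay, now):
    """Return references on which `function` was performed at least `delay` ago.

    query_changes(function, start, end) is assumed to yield
    (reference, performed_at) pairs for the given timeframe.
    """
    cutoff = now - delay
    due = []
    for reference, performed_at in query_changes(function, datetime.min, cutoff):
        if performed_at <= cutoff:  # defensive check; do not rely on the query bounds alone
            due.append(reference)
    return due
```

The resulting list is what would be passed on to the delayed function-specific replication 160.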

Referring to FIG. 11, the outstanding replications maintenance process 150 ensures the maintenance of a consistent set of files or objects on the target storage array. The maintenance of the replications monitoring database requires two steps:

(1) the outstanding replications maintenance process 152 implemented with several substeps detailed below, and
(2) once the consistency of the functions to be replicated is ensured, launching the delayed function-specific replication of data manipulating functions 160 for the class of storage arrays requiring the replications monitoring database.

The maintenance process 150 itself consists of the steps:

(1) using the outstanding replications monitoring process 153, checking the source change table for newly arrived functions to be replicated;
(2) inserting non-completed functions to be replicated in the outstanding replications table 154 with the required information to perform the change;
(3) determining whether the new function needs to be replicated 155, the decision depending on the source, reference and priority of the function;
(4) if the function is replicated, inserting it into the outstanding replications table 156 together with the target, reference, function and replication time for the data manipulating function to be replicated;
(5) ensuring the consistency of non-completed data manipulating functions in step 157, wherein, if the new function has a higher priority and a shorter delay than other pending functions in the outstanding replications table for the same source and reference, the already pending replications are removed from the outstanding replications table and only the new function with higher priority is maintained;
(6) for all functions, updating the source change table 158 with the information that the corresponding function has completed its maintenance step;
(7) now having ensured a consistent set of outstanding replications, invoking the delayed function-specific replication of data manipulating functions 160.
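By way of illustration only, the consistency rule of step 157 might be sketched as follows. The dictionary-based rows, the `CONFIG` mapping, and the convention that a larger number means a higher priority are assumptions made for this sketch; the disclosure does not prescribe a particular encoding.

```python
from datetime import datetime, timedelta

# Hypothetical configuration: function -> (priority, delay).
# Assumption: a larger number means a higher priority.
CONFIG = {"write": (2, timedelta(0)), "delete": (1, timedelta(days=30))}

def replication_time(performed_at: datetime, delay: timedelta) -> datetime:
    """Due time on the target = time performed on the source + configured delay."""
    return performed_at + delay

def maintain(outstanding, change, target, config=CONFIG):
    """Process one source-change entry and return the updated outstanding list."""
    change["completed"] = True              # step 158: applies to all functions
    if change["function"] not in config:    # step 155: function is not replicated
        return outstanding
    priority, delay = config[change["function"]]

    def superseded(row):
        # Step 157: same source and reference, and the new function has
        # both a higher priority and a shorter delay than the pending one.
        return (row["source"] == change["source"]
                and row["reference"] == change["reference"]
                and priority > row["priority"] and delay < row["delay"])

    kept = [row for row in outstanding if not superseded(row)]
    kept.append({                           # step 156: insert the new function
        "source": change["source"], "target": target,
        "reference": change["reference"], "function": change["function"],
        "priority": priority, "delay": delay,
        "replication_time": replication_time(change["time"], delay),
    })
    return kept
```

Under these assumptions, a write arriving for a file that has a pending delayed delete removes the pending delete, whereas a delete arriving after a pending write is simply queued behind it.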

Referring now to FIG. 12, which shows the delayed function-specific replication of data manipulating functions 160 for the first class of storage arrays, the delayed function-specific replication process 162 is made up of several steps, including:

(1) performing the delayed replication using the data manipulating function replication process 164;

(2) querying all pending functions to be replicated since the last invocation of the process from the outstanding replications table 165 with a replication time prior to the current time;

(3) performing the data manipulating function on the target storage array's files or objects 166; and

(4) recording the completion of the functional replication in the outstanding replications table 167 of the replication monitoring database, and optionally inserting the completion of the replication of the data manipulating function in the replication audit table 168 for audit purposes.
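By way of illustration only, one invocation of the steps above might be sketched as follows. The dictionary-based rows, the `perform` callback that stands in for the operation on the target storage array, and the use of "at or before the current time" as the due condition are assumptions made for this sketch.

```python
from datetime import datetime

def run_due_replications(outstanding, perform, now: datetime, audit=None):
    """Replicate every pending function whose replication time has been reached.

    perform(target, reference, function) is a caller-supplied callback that
    carries out the function on the target storage array.
    """
    done = 0
    for row in outstanding:
        # Step 165: pending rows with a replication time not later than `now`.
        if row["completion"] is None and row["replication_time"] <= now:
            perform(row["target"], row["reference"], row["function"])  # step 166
            row["completion"] = now                                    # step 167
            if audit is not None:
                audit.append(dict(row))                                # step 168 (optional)
            done += 1
    return done
```

Because completed rows are skipped, invoking the function again with the same time performs no further replications.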

Referring now to FIG. 13, the delayed function-specific replication of data manipulating functions is performed for storage arrays with query functionality not requiring the maintenance of a replications monitoring database. The delayed function-specific replication process 170 invokes the data manipulating function replication process 171 continuously or based on a schedule. The list of objects or files is available from the replications monitoring process 149 based on the function to be replicated with the configured delay. In step 172, the function is performed on the target storage array's objects or files.

The function-specific time-configurable replication of data manipulating functions for the first class of storage arrays requires a replication monitoring database that provides all the information required to implement a function-specific delayed replication in a consistent manner.

The minimum information that must be available to implement a functional function-specific replication of data manipulating functions is found in the detailed description of the four tables below.

Configuration Table 22 (FIG. 14)

-   Source: the source object or file based storage system that the function was performed on.
-   Targets: derived from the configuration input, the target storage systems for each source storage system.
-   Function: function to be replicated.
-   Priority: priority of the specified function.
-   Delay: delay for the specified function.
-   Modifier: provides the possibility to add a higher degree of granularity for the replication of data manipulating functions.

Examples of different business requirements and their implementation in a configuration table are listed in Table 26 in FIG. 18.

Source Change Table 24 (FIG. 15)

-   Source: source object or file based storage system that the function was performed on.
-   Reference: object or file (UNC) reference that the function was performed on.
-   Function: function that was performed.
-   Time: date and time the function was performed on the object or file.
-   Modifier: additional information like application, user, part of the UNC path to provide a more granular data replication.
-   Completed: once a functional change has been treated by the outstanding replications maintenance process, the completion is stored in the Source Change Table. Simple yes/no flag for quick rebuilds of the outstanding replication table.
-   Before/After image: all information required for the replication of the functional change on the object or file if necessary for the class of storage array.

New entries in the Source Change Table might trigger a function that inserts the corresponding entry or entries in the outstanding replication table. This outstanding replications maintenance process 150 may also run continuously or based on a schedule to update the outstanding replication table.

Outstanding Replication Table 18 (FIG. 16)

The replication table is based on the configuration table and the inserts into the source change table. Changes in the configuration may require the replication table to be rebuilt for entries in the source change table that were not yet completed.

-   Target: target system the function needs to be replicated to.
-   Reference: object or file (UNC) reference that the function will be replicated to.
-   Function: function to be replicated.
-   Replication Time: date and time at which the function needs to be replicated.
-   Completion: date and time the replication has been performed.

An update with a Completion might trigger a function that creates an insert with the required information in the Replication Audit table.

Replication Audit Table 24 (FIG. 17)

The audit table provides a means of proving which replications have already been performed.

-   Source: source object or file based storage system that the function was performed on.
-   Reference: object or file (UNC) reference that the function was performed on.
-   Function: function that was performed.
-   Time: date and time the function was performed on the object or file.
-   Modifier: additional information like application, user, part of the UNC path to provide a more granular data replication.
-   Target: target system the function needs to be replicated to.
-   Replication Time: date and time the function needs to be replicated by.
-   Completion: date and time the replication has been performed.

The delayed function-specific replication of data manipulating functions 160, for delaying a deletion of data from the source storage system until the data is also deleted from the destination storage system, is achieved by configuring the delete function with the lowest priority and the longest delay used for the function-specific replication of data manipulating functions.
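By way of illustration only, a configuration could be checked against this rule as follows. The mapping of function name to a (priority, delay) pair, the convention that a smaller number means a lower priority, and the function name `delete_is_last` are assumptions made for this sketch.

```python
from datetime import timedelta

def delete_is_last(config):
    """True if 'delete' has the lowest priority and the longest delay of all functions.

    config maps a function name to (priority, delay).
    Assumption: a smaller number means a lower priority.
    """
    d_priority, d_delay = config["delete"]
    others = [value for name, value in config.items() if name != "delete"]
    return all(d_priority < p and d_delay > d for p, d in others)
```

For instance, a configuration in which writes replicate immediately and deletes replicate after thirty days with the lowest priority satisfies the check.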

Generally, the delayed function-specific replication of data manipulating functions may run continuously or based on a schedule. In the scheduled way, the replication is initiated in regular intervals at specific times. Every time this interval expires, the pending replications monitoring process updates the source change table with non-replicated data manipulating functions, the outstanding replications maintenance process is run if the replication takes place between storage arrays requiring the replication monitoring database, and the delayed function-specific replication of data manipulating functions is run for functions with an expired delay.

The delayed function-specific replication of data manipulating functions needs to follow the same directions as described in the information gathering and pending replications monitoring sections herein. The replication needs to be implemented for standard operating system based storage arrays differently than for proprietary storage systems and depends on the functionality of the storage arrays to be supported. In an example of an operating system based storage array that provides file sharing, the function-specific replication would, in case of a write on \\Server1\Share1\File1.doc, create the new file on the target storage array under \\Server2\Share5\File1.doc. In case of a proprietary storage array like EMC Centera, the function-specific replication would read object FGLSO3eJ90S2 from the source storage array reachable at IP Address 192.168.2.1 and create the same object FGLSO3eJ90S2 on the target storage array at IP Address 156.172.50.33. In case of the operating system based replication, the replication involves standard file system operations, and in the case of EMC Centera, the function-specific replication needs to integrate the API required to access the source and target storage array.
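By way of illustration only, the translation of a source file reference into the corresponding target reference in the file sharing example might be sketched as follows. The function name `map_reference` and the share mapping passed to it are assumptions made for this sketch; the comparison is made case-insensitively, as is usual for UNC share names.

```python
def map_reference(reference, share_map):
    """Translate a source UNC reference into the matching target UNC reference.

    share_map maps a source share (e.g. \\\\Server1\\Share1) to a target share.
    """
    for src_share, dst_share in share_map.items():
        prefix = src_share.rstrip("\\") + "\\"
        if reference.lower().startswith(prefix.lower()):
            # Keep the path below the share unchanged; swap only the share part.
            return dst_share.rstrip("\\") + "\\" + reference[len(prefix):]
    raise KeyError("no target share configured for " + reference)

# Usage with the shares named in the example above.
shares = {r"\\Server1\Share1": r"\\Server2\Share5"}
target_path = map_reference(r"\\Server1\Share1\File1.doc", shares)
```

The actual write on the target would then be an ordinary file system operation against the returned path.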

In an advantage, the method makes replication of data manipulating functions dependent on the function that was performed on the data and makes the delay of the replication time-configurable, thereby providing a solution for both disaster tolerance and logical error recovery. This allows the customer to ensure that data on storage arrays is recoverable for the same time that a restore and recovery from the production references of the objects or files is possible. Such a system thus guarantees that all objects and files are available as long as references to that data may be restored from backups.

In another advantage, the system and method of the invention can extend existing function-specific replications without configurable delay by replicating some data manipulating functions with a specified delay. As an example, the replication between a source and a destination storage array would continue to replicate write functions, but the replication of delete functions from the source storage array would be delayed using the current invention for a period N until the data is also deleted from the target storage array. This allows the restoring of an application database using the standard recovery procedure and thus provides the possibility to access the previously deleted data on the secondary storage array without having to have a complete backup of all data ever written to the source storage array. Once the standard recovery procedure is also no longer capable of recovering data, the file or object referenced can also be deleted on the target storage array by the delayed function-specific replication of data manipulating functions.

The patents and articles mentioned above are hereby incorporated by reference herein, unless otherwise noted, to the extent that the same are not inconsistent with this disclosure.

Other characteristics and modes of execution of the invention are described in the appended claims.

Further, the invention should be considered as comprising all possible combinations of every feature described in the instant specification, appended claims, and/or drawing figures which may be considered new, inventive and industrially applicable.

Multiple variations and modifications are possible in the embodiments of the invention described here. Although certain illustrative embodiments of the invention have been shown and described here, a wide range of modifications, changes, and substitutions is contemplated in the foregoing disclosure. While the above description contains many specifics, these should not be construed as limitations on the scope of the invention, but rather as exemplifications of one or another preferred embodiment thereof. In some instances, some features of the present invention may be employed without a corresponding use of the other features. Accordingly, it is appropriate that the foregoing description be construed broadly and understood as being given by way of illustration and example only, the spirit and scope of the invention being limited only by the claims which ultimately issue in this application.

1. A system (10) for function specific replication of data manipulating functions performed on files or objects stored on a source system (20, 65) and to be backed-up on at least one destination storage system (30, 75), the replication system comprising: a replication management module (40) for managing consistent replication of data manipulating functions (12) from the source storage system (20, 65) to the destination storage system (30, 75), including replication of data manipulating functions (12) between the source storage system (20, 65) and the at least one destination storage system (30, 75), optionally comprising a replication monitoring database (42), the system (10) characterised in that the managing of replication includes replication of functions (12) with a configurable time delay (14) for each function to be replicated.
2. The replication system (10) of claim 1, wherein the replication system is adapted to replicate data manipulating functions (12) after receiving a command function selected from a group of functions consisting of write, delete, update, modify, write-disable, write disable until expiration date, delete-disable and delete-disable until expiration date.
3. The replication system (10) of claim 1 wherein the replication management module (40) provides functionality allowing: configuration of a delay (14) after which a specific data manipulating function (12) performed on data stored on the source storage system (20, 65) is replicated on corresponding data on the destination storage system (30, 75), replication of the data manipulating function (12) performed on data stored on the source storage system (20, 65) with the configured delay (14) to the destination storage system (30, 75), and querying function-specific changes on data of the source storage system (20, 65) in a given timeframe.
4. The replication system (10) of claim 1, wherein the storage system (20, 65, 30, 75) is based on one of a group of architectures consisting of: object-based storage arrays (60) comprising an application programming interface (34), file-based storage arrays (60), and a computer server (80), comprising memory (36), a CPU (38) and an operating system (39).
5. The replication system (10) of claim 1, wherein the instructions of the replication management module (40) are stored on one of either the source storage system (20, 65) or the destination storage system (30, 75).
6. The replication system (10) of claim 1, wherein the replication management module (40) is configured to provide access to storage systems (20, 65, 30, 75) based on one of a group of architectures consisting of: object-based storage systems (60) comprising an application programming interface, file-based storage arrays (60), and a computer server (80), comprising memory (36), a CPU (38) and an operating system (39).
7. A computerized method (100) encoded on a computer readable medium (36), the method (100) managing consistent replication of data manipulating functions between a source storage system (20, 65) and at least one destination storage system (30, 75), the method comprising instructions for: (a) configuration of a delay (14) after which a specific data manipulating function (12) performed on data stored on the source storage array (20, 65) will be replicated to data stored on the destination storage array(s) (30, 75); (b) gathering information (120) on functions (12) that were performed on data stored on a source storage system (20, 65), optionally including the step of building a replication monitoring database (42) including information on the functions (12) that were performed on data stored on a source storage system (20, 65); (c) querying the replication monitoring database (42) on the replication time (16) for outstanding data manipulating functions (12′) to be replicated by running a pending replications monitoring process (140); and (d) replicating the data manipulating function (12) performed on the source storage system (20, 65) to the destination storage system(s) (30, 75).
8. The method (100) of claim 7, wherein the replication monitoring process (140) comprises configuring a query for a function-specific replication of data manipulating functions (12′) on a per function basis, using an input table (22) accessible to the user (90) via a user interface (19), comprising the steps of: (1) defining a source storage system (20, 65) and at least one destination storage system (30, 75), (2) listing the data manipulating functions (12) to be replicated between source and destination storage system, (3) specifying a function-specific delay (14) for each function (12) and relationship of source to destination storage system (30, 75), (4) specifying the frequency (26) at which the replication monitoring database (42) is queried for outstanding replications of data manipulating functions (12′) to be sent to the function replication processes (160), (5) delaying function-specific replication of data manipulating functions (12), including the sub-steps of configuring the time delay (14) used for the function-specific replication of data manipulating functions, and specifying a function replication delay (14), thereby delaying execution of a function until predetermined conditions are met.
9. The method (100) of claim 7, wherein the source storage system (20, 65) is a storage array (65) comprising an operating system (39) that provides file level access to data, from which information on functions (12) that were performed on data can be obtained, and which stores self-installing information gathering software encoded with instructions for executing an information gathering process (122) allowing for installation and running on a client computer.
10. The method (100) of claim 9, wherein the step (122) of gathering information comprises the substeps of: inserting information for the function-specific delayed replication of data manipulating functions (12) in a source change table (24) of a replication monitoring database (42), directly from the information gathering software, wherein the information to be inserted into the database (42) by the information gathering software includes: a file reference (144) in form of the UNC path to the file, the function (12) that was performed on the file, date and time the function was performed, and optionally, the modifier (129) that performed the function, and a before and after image (148) of the object or file modified by the function.